Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow all characters in runfile source and target paths #23331

Closed
wants to merge 10 commits into from

Conversation

fmeum
Copy link
Collaborator

@fmeum fmeum commented Aug 19, 2024

Adds support for spaces and newlines in source and target paths of runfiles symlinks to build-runfiles as well as to the Bash, C++, and Java runfiles libraries (the Python runfiles library has moved out of the repo).

If a symlink has spaces or newlines in its source path, it is prefixed with a space in the manifest and spaces, newlines, and backslashes in the source path are escaped with \s, \n, and \b respectively. This scheme has been chosen as it has the following properties:

  1. There is no change to the format of manifest lines for entries whose source and target paths don't contain a space. This ensures compatibility with existing runfiles libraries.
  2. There is even no change to the format of manifest lines for entries whose target path contains spaces but whose source path does not. These manifests previously failed in build-runfiles, but were usable on Windows and supported by the official runfiles libraries. This also ensures that the initialization snippet for the Bash runfiles library doesn't need to change, even when used on Unix with a space in the output base path.
  3. The scheme is fully reversible and only depends on the source path, which gives runfiles libraries a choice between reversing the escaping when parsing the manifest (C++, Java) or applying the escaping when searching the manifest (Bash).

Fixes #4327

RELNOTES: Bazel now supports all characters in the rlocation and target paths of runfiles and can be run from workspaces with a space in their full path.

@fmeum fmeum force-pushed the 4327-spaces-in-runfiles branch 2 times, most recently from d4842e1 to 8a03d1b Compare August 19, 2024 11:38
@fmeum fmeum requested a review from lberki August 19, 2024 20:54
@fmeum fmeum marked this pull request as ready for review August 19, 2024 20:54
@fmeum fmeum requested a review from a team as a code owner August 19, 2024 20:54
@fmeum fmeum requested review from gregestren, a team and katre and removed request for a team, gregestren and katre August 19, 2024 20:54
@github-actions github-actions bot added team-Configurability platforms, toolchains, cquery, select(), config transitions team-Rules-Java Issues for Java rules team-Rules-CPP Issues for C++ rules awaiting-review PR is awaiting review from an assigned reviewer labels Aug 19, 2024
@fmeum
Copy link
Collaborator Author

fmeum commented Aug 19, 2024

@lberki I assigned you as reviewer for your general knowledge of runfiles, but feel free to redirect if needed.

@fmeum
Copy link
Collaborator Author

fmeum commented Aug 19, 2024

CC @meteorcloudy for Windows

Copy link
Member

@meteorcloudy meteorcloudy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, this is great!

src/main/tools/build-runfiles-windows.cc Show resolved Hide resolved
@@ -166,18 +166,6 @@ private DetailedExitCode checkCwdInWorkspace(EventHandler eventHandler) {
}

Path workspacePath = workspace.getWorkspace();
// TODO(kchodorow): Remove this once spaces are supported.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the runfiles path the only reason we didn't support space in workspace path? /cc @lberki

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could see missing escaping in the test setup shell script as one other source of issues. When the rest of this PR and the general approach look good to you, I could run Bazel's test suite under a path with a space to shake out these issues.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, @fmeum 's approach is reasonable; spaces in workspace paths is the kind of thing that could break in unexpected places. Yes, we should properly quote everywhere, but that doesn't mean that we do if the functionality is untested.

@sgowroji
Copy link
Member

Hi @meteorcloudy, Since i can see that this PR has been approved, Please let us know whether gTech team should proceed with importing it. Thanks!

@fmeum
Copy link
Collaborator Author

fmeum commented Aug 21, 2024

@sgowroji I would prefer to wait for a second opinion, for example by @lberki.

@matthewjh
Copy link

@fmeum: is the aim to get this into Bazel 8.0.0?

@lberki
Copy link
Contributor

lberki commented Sep 19, 2024

In principle, adding the functionality to encode spaces in runfiles manifests is fine. I'm not sure, though, how I feel about this particular one because it's tailored to spaces and only spaces. How about implementing a scheme that could be more easily extended?

Spitballing, something like this: if a line in the runfiles manifest starts with a space, the algorithm to parse the line is as follows: the separator is between the two paths is a space, end of the line is a newline, any other character is a file name. If a file name contains a space, a newline or a backslash, these need to be escaped as \s, \n or \b, respectively, any other backslash sequences are reserved for future extension.

This is easy to implement, provably supports everything is just as incompatible with the existing system and looks more conventional than the encoding proposed in this pull request.

@fmeum WDYT? It looks like if we go through the pain of adding syntax to runfiles manifests, we could as well make it future-proof. I could also be convinced to make them some subset of JSON so that parsing them is possible with standard libraries but also with random logic so that one doesn't need to add additional dependencies.

@fmeum
Copy link
Collaborator Author

fmeum commented Sep 20, 2024

This is easy to implement, provably supports everything is just as incompatible with the existing system and looks more conventional than the encoding proposed in this pull request.

It lacks one important property, but we can also fix that: With this scheme in its current form, the Bash runfiles setup snippet as well as third-party runfiles libraries would need to be modified to continue to support manifest entries in which the rlocationpath contains no spaces, but the target path does (common on Windows).

We can fix that by not escaping spaces in the target paths and declaring that the first space on a line that doesn't start with a space is the separator. We would then only add a space at the beginning of a line if either the rlocationpath contains a space or newline or if the target path contains a newline.

At that point it's similar to the scheme in this PR, it just uses escape characters instead of length encoding. I agree that escaping is more conventional and probably not much harder to implement in most languages (it does change the length of the strings vs. just taking a substring, but that seems okay).

@lberki What do you think, should I go ahead and implement the modified version of your scheme?

@fmeum fmeum force-pushed the 4327-spaces-in-runfiles branch 2 times, most recently from 605c2b5 to 34d1f84 Compare September 20, 2024 12:05
@fmeum
Copy link
Collaborator Author

fmeum commented Sep 20, 2024

I implemented the scheme without support for newlines for now. They could always be supported in a follow-up, which is much simpler with the new scheme (just introduce a new escape character everywhere).

@fmeum fmeum force-pushed the 4327-spaces-in-runfiles branch 2 times, most recently from 9a141b6 to 0d7a3cc Compare September 20, 2024 12:35
@lberki
Copy link
Contributor

lberki commented Sep 23, 2024

The modified scheme looks reasonable. I'm travelling today so I can't review the pull request but will do so as soon as I can.

The main question in my mind is whether we do some sort of custom encoding (like this one) or go with a standard one. Of the standard ones, JSON seems to be the most palatable given the use case (simple list of key/value pairs intended to be parseable without any dependencies), but I'm not sure if it's worth the hassle and if it is, we are probably better off having an additional file called MANIFEST.json or something like that

@fmeum fmeum force-pushed the 4327-spaces-in-runfiles branch 2 times, most recently from 8775fde to 818e9af Compare October 4, 2024 08:39
@fmeum
Copy link
Collaborator Author

fmeum commented Oct 4, 2024

@Wyverald Done

@fmeum fmeum force-pushed the 4327-spaces-in-runfiles branch from 818e9af to b621e65 Compare October 4, 2024 08:43
@fmeum fmeum force-pushed the 4327-spaces-in-runfiles branch from b621e65 to 4629915 Compare October 4, 2024 08:43
@copybara-service copybara-service bot closed this in 7407cef Oct 7, 2024
@github-actions github-actions bot removed the awaiting-PR-merge PR has been approved by a reviewer and is ready to be merge internally label Oct 7, 2024
phst added a commit to phst/rules_elisp that referenced this pull request Oct 8, 2024
Bazel supports them with bazelbuild/bazel#23331.

Step 1: support them for directory-based runfiles.  This doesn’t require any
code changes, just adapt the test.
phst added a commit to phst/rules_elisp that referenced this pull request Oct 8, 2024
Bazel supports them with bazelbuild/bazel#23331.

Step 1: support them for directory-based runfiles.  This doesn’t require any
code changes, just adapt the test.
phst added a commit to phst/rules_elisp that referenced this pull request Oct 8, 2024
Bazel supports them with bazelbuild/bazel#23331.

Step 1: support them for directory-based runfiles.  This doesn’t require any
code changes, just adapt the test.
phst added a commit to phst/rules_elisp that referenced this pull request Oct 8, 2024
Bazel supports them with bazelbuild/bazel#23331.

Step 2: support them for manifest-based runfiles.
phst added a commit to phst/rules_elisp that referenced this pull request Oct 8, 2024
Bazel supports them with bazelbuild/bazel#23331.

Step 2: support them for manifest-based runfiles.
phst added a commit to phst/rules_elisp that referenced this pull request Oct 8, 2024
Bazel supports them with bazelbuild/bazel#23331.

Step 1: support them for directory-based runfiles.  This doesn’t require any
code changes, just adapt the test.
phst added a commit to phst/rules_elisp that referenced this pull request Oct 8, 2024
Bazel supports them with bazelbuild/bazel#23331.

Step 2: support them for manifest-based runfiles.
@fmeum fmeum deleted the 4327-spaces-in-runfiles branch October 8, 2024 07:00
fmeum added a commit to fmeum/bazel that referenced this pull request Oct 9, 2024
Adds support for spaces and newlines in source and target paths of runfiles symlinks to `build-runfiles` as well as to the Bash, C++, and Java runfiles libraries (the Python runfiles library has moved out of the repo).

If a symlink has spaces or newlines in its source path, it is prefixed with a space in the manifest and spaces, newlines, and backslashes in the source path are escaped with `\s`, `\n`, and `\b` respectively. This scheme has been chosen as it has the following properties:
1. There is no change to the format of manifest lines for entries whose source and target paths don't contain a space. This ensures compatibility with existing runfiles libraries.
2. There is even no change to the format of manifest lines for entries whose target path contains spaces but whose source path does not. These manifests previously failed in `build-runfiles`, but were usable on Windows and supported by the official runfiles libraries. This also ensures that the initialization snippet for the Bash runfiles library doesn't need to change, even when used on Unix with a space in the output base path.
3. The scheme is fully reversible and only depends on the source path, which gives runfiles libraries a choice between reversing the escaping when parsing the manifest (C++, Java) or applying the escaping when searching the manifest (Bash).

Fixes bazelbuild#4327

RELNOTES: Bazel now supports all characters in the rlocation and target paths of runfiles and can be run from workspaces with a space in their full path.

Closes bazelbuild#23331.

PiperOrigin-RevId: 683362776
Change-Id: I1eb79217dcd53cef0089d62a7ba477b1d8f52c21

(cherry picked from commit 7407cef)
fmeum added a commit to fmeum/bazel that referenced this pull request Oct 9, 2024
Adds support for spaces and newlines in source and target paths of runfiles symlinks to `build-runfiles` as well as to the Bash, C++, and Java runfiles libraries (the Python runfiles library has moved out of the repo).

If a symlink has spaces or newlines in its source path, it is prefixed with a space in the manifest and spaces, newlines, and backslashes in the source path are escaped with `\s`, `\n`, and `\b` respectively. This scheme has been chosen as it has the following properties:
1. There is no change to the format of manifest lines for entries whose source and target paths don't contain a space. This ensures compatibility with existing runfiles libraries.
2. There is even no change to the format of manifest lines for entries whose target path contains spaces but whose source path does not. These manifests previously failed in `build-runfiles`, but were usable on Windows and supported by the official runfiles libraries. This also ensures that the initialization snippet for the Bash runfiles library doesn't need to change, even when used on Unix with a space in the output base path.
3. The scheme is fully reversible and only depends on the source path, which gives runfiles libraries a choice between reversing the escaping when parsing the manifest (C++, Java) or applying the escaping when searching the manifest (Bash).

Fixes bazelbuild#4327

RELNOTES: Bazel now supports all characters in the rlocation and target paths of runfiles and can be run from workspaces with a space in their full path.

Closes bazelbuild#23331.

PiperOrigin-RevId: 683362776
Change-Id: I1eb79217dcd53cef0089d62a7ba477b1d8f52c21

(cherry picked from commit 7407cef)
github-merge-queue bot pushed a commit that referenced this pull request Oct 9, 2024
(contains the refactor in f311efc to
reduce merge conflicts)

Adds support for spaces and newlines in source and target paths of
runfiles symlinks to `build-runfiles` as well as to the Bash, C++, and
Java runfiles libraries (the Python runfiles library has moved out of
the repo).

If a symlink has spaces or newlines in its source path, it is prefixed
with a space in the manifest and spaces, newlines, and backslashes in
the source path are escaped with `\s`, `\n`, and `\b` respectively. This
scheme has been chosen as it has the following properties:
1. There is no change to the format of manifest lines for entries whose
source and target paths don't contain a space. This ensures
compatibility with existing runfiles libraries.
2. There is even no change to the format of manifest lines for entries
whose target path contains spaces but whose source path does not. These
manifests previously failed in `build-runfiles`, but were usable on
Windows and supported by the official runfiles libraries. This also
ensures that the initialization snippet for the Bash runfiles library
doesn't need to change, even when used on Unix with a space in the
output base path.
3. The scheme is fully reversible and only depends on the source path,
which gives runfiles libraries a choice between reversing the escaping
when parsing the manifest (C++, Java) or applying the escaping when
searching the manifest (Bash).

Fixes #4327

RELNOTES: Bazel now supports all characters in the rlocation and target
paths of runfiles and can be run from workspaces with a space in their
full path.

Closes #23331.

PiperOrigin-RevId: 683362776
Change-Id: I1eb79217dcd53cef0089d62a7ba477b1d8f52c21

(cherry picked from commits
7407cef
and f311efc)

Closes #23715

---------

Co-authored-by: Googler <jhorvitz@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
team-Configurability platforms, toolchains, cquery, select(), config transitions team-Rules-CPP Issues for C++ rules team-Rules-Java Issues for Java rules
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Runfiles: support paths with spaces
6 participants